Frequent Item-set Mining without Ubiquitous Items

Authors

  • Ran M. Bittmann
  • Philippe Nemery
  • Xingtian Shi
  • Michael Kemelmakher
  • Mengjiao Wang
Abstract

Frequent Item-set Mining (FIM), sometimes called Market Basket Analysis (MBA) or Association Rule Learning (ARL), is a family of Machine Learning (ML) methods for creating rules from datasets of transactions of items. Most methods identify items likely to appear together in a transaction based on the support for that hypothesis (i.e. a minimum relative number of co-occurrences of the items). Although support is a good indicator of the relevance of the assumption that these items are likely to appear together, most algorithms do not address the phenomenon of very frequent items, referred to as ubiquitous items. Ubiquitous items have the same entropy as infrequent items and do not contribute significantly to the knowledge. On the other hand, they have a strong effect on the performance of FIM algorithms, sometimes preventing their convergence and thus the provision of meaningful results. This paper discusses the phenomenon of ubiquitous items and demonstrates that ignoring them has a dramatic effect on computational performance, with only a low and controlled effect on the significance of the results.
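
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes that an item counts as ubiquitous when its relative support exceeds a hypothetical `max_support` threshold, and it mines only frequent pairs by brute-force counting. The function names, the thresholds, and the toy transactions are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log2


def binary_entropy(p):
    """Entropy in bits of an item that appears with relative support p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def item_supports(transactions):
    """Relative support of every single item."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def drop_ubiquitous(transactions, max_support):
    """Remove items whose support exceeds max_support (assumed threshold)."""
    supports = item_supports(transactions)
    ubiquitous = {i for i, s in supports.items() if s > max_support}
    return [set(t) - ubiquitous for t in transactions], ubiquitous


def frequent_pairs(transactions, min_support):
    """Brute-force count of item pairs that meet the minimum support."""
    counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Toy data: "bag" appears in every transaction, so it is ubiquitous.
transactions = [
    {"bag", "milk", "bread"},
    {"bag", "milk", "eggs"},
    {"bag", "bread", "butter"},
    {"bag", "milk", "bread", "butter"},
]
filtered, ubiquitous = drop_ubiquitous(transactions, max_support=0.9)
pairs = frequent_pairs(filtered, min_support=0.5)
```

Binary entropy is symmetric around a support of 0.5, which is why an item with support 0.99 carries as little information as one with support 0.01. Dropping such items before mining shrinks every transaction, and therefore the number of candidate item-sets that have to be counted.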

Similar resources

SaM: A Split and Merge Algorithm for Fuzzy Frequent Item Set Mining

This paper presents SaM, a split and merge algorithm for frequent item set mining. Its distinguishing qualities are an exceptionally simple algorithm and data structure, which not only render it easy to implement, but also convenient to execute on external storage. Furthermore, it can easily be extended to allow for “fuzzy” frequent item set mining in the sense that missing items can be inserte...

Efficient Utility Based Infrequent Weighted Item-Set Mining

Association Rule Mining (ARM) is one of the most popular data mining techniques. Most past work is based on frequent item-sets. In recent years, researchers have focused on infrequent item-set mining. The infrequent item-set mining problem is to discover item-sets whose frequency in the data is less than or equal to a maximum threshold. This paper addresses the min...

Study on High Utility Itemset Mining

Data mining is the process of mining new, non-trivial, and potentially valuable information from large databases. Data mining has been used in the analysis of customer transactions in retail research, where it is termed market basket analysis. Earlier data mining methods concentrated more on the correlation between the items that occur more frequently in the transactions. In frequent itemset mini...

A Survey of Frequent and Infrequent Weighted Itemset Mining Approaches

Itemset mining is a data mining method extensively used for learning important correlations among data. Initially, itemset mining focused on discovering frequent itemsets. Frequent weighted itemsets characterize data in which items may be weighted differently through frequent correlations in the data. But, in some situations, for instance certain cost functions need to be minimized for determining r...

An Efficient Data Mining Method to Find Frequent Item Sets in Large Database Using Tr- Fctm

Mining association rules in large databases is one of the most popular data mining techniques for business decision makers. Discovering frequent item sets is the core process in association rule mining. Numerous algorithms are available in the literature to find frequent patterns. Apriori and FP-tree are the most common methods for finding frequent items. Apriori finds significant frequent items usin...

Publication date: 2018